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The Newcomb-Benford law, also known as the first-digit law, gives the probability distribution associated 
with the first digit of a dataset, so that, for example, the first significant digit has a probability of 30.1 % 
of being 1 and 4.58 % of being 9. This law can be extended to the second and next significant digits. This 
article presents an introduction to the discovery of the law, its derivation from the scale invariance property, 
as well as some applications and examples. Additionally, a simple model of a Markov process inspired by scale 
invariance is proposed. Within this model, it is proved that the probability distribution irreversibly converges 
to the Newcomb-Benford law, in analogy to the irreversible evolution toward equilibrium of physical systems 


in thermodynamics and statistical mechanics. 


|. INTRODUCTION 


In the late 19th century, an astronomer and mathe- 
matician visits his institution’s library and consults a ta- 
ble of logarithms to perform certain astronomical calcu- 
lations. As on previous occasions, he is struck by the fact 
that the first pages (those corresponding to numbers that 
start at 1) are much more worn than the last ones (cor- 
responding to numbers that start at 9). Intrigued, this 
time he decides not to overlook the matter. He closes his 
eyes to concentrate, sketches a few calculations on a piece 
of paper, and finally smiles. He has found the answer and 
it turns out to be enormously simple and elegant. 

A little over half a century later, a physicist and elec- 
trical engineer who is unaware of his predecessor’s discov- 
ery, observes the same curious phenomenon on the pages 
of logarithm tables, and arrives at the same conclusion. 
Both have understood that, in a long list of records {rn } 
obtained from measurements or observations, the frac- 
tion pa of records beginning with the significant digit 
d= 1,2,...,9 is not pa = 1/9, as one might naively ex- 
pect, but rather follows a logarithmic law. More specifi- 
cally, 


pa= logo (1+5). d= 12h. 25 9) (1) 
The numeric values of pg are shown in the second column 
of Table [I] We see that the records that start with 1, 2, 
or 3 account for around 60 % of the total, while the other 
six digits must settle for the remaining 40 %. 

Our 19th century character is Simon Newcomb (Fig. 
[) and he published his discovery in a modest two-page 
note The second character is Frank Benford (Fig. 
and he wrote a 22-page article in which, in addition 
to mathematically justifying Eq. (I), he showed its va- 
lidity in the analysis of more than 20000 first digits 
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TABLE I. Probabilities for the first, second, third, and fourth 
significant digits. 


Digit First Second Third Fourth 
d pa pp p. pi 
0 vee 0.119 68 0.101 78 0.100 18 
1 0.301 03 0.113 89 0.101 38 0.100 14 
2 0.176 09 0.108 82 0.100 97 0.100 10 
3 0.124 94 0.104 33 0.100 57 0.100 06 
4 0.096 91 0.100 31 0.100 18 0.100 02 
5 0.079 18 0.096 68 0.099 79 0.099 98 
6 0.066 95 0.093 37 0.099 40 0.099 94 
7 0.057 99 0.090 35 0.099 02 0.099 90 
8 0.051 15 0.087 57 0.098 64 0.099 86 
9 0.045 76 0.084 90 0.098 27 0.099 82 


taken from sources as varied as river areas, populations 
of American cities, physical constants, atomic and molec- 
ular weights, specific heats, numbers taken from newspa- 
pers or Reader’s Digest, postal addresses, ..., and the se- 
ries n+, yn, n?, or n!, among others, with n € [1, 100]. 

With such an overwhelming evidence, it is not surpris- 
ing that Eq. is usually known as Benford’s law (or 
first-digit law), even though it was found nearly sixty 
years earlier by Newcomb. This is one more manifesta- 
tion of the so-called Stigler’s law, according to which no 
scientific discovery is named after the person who dis- 
covered it in the first place. In fact, as Stigler himself 
points out the law that bears his name was actually 
spelled out in a similar way twenty-three years earlier by 
the American sociologist Robert K. Merton. In order not 
to fall completely into Stigler’s law, many authors refer 
to Eq. (I) as Newcomb-Benford’s law and that is the 
convention (by means of the acronym NBL) that we will 
follow in this article. 

While the literature on the NBL in specialized journals 
is vast; and several books on the topic are available 2 
the number of papers in general or science education jour- 
nals is much scarcer-£2 In this paper, apart from revis- 


FIG. 1. Simon Newcomb (1835-1909). 


FIG. 2. Frank Benford (1883-1948). 


iting the NBL and illustrating it with a few examples, 
we construct a Markov-chain model inspired by the in- 
variance property of the NBL under the operation of a 
change of scale. 


The remainder of the paper is organized as follows. 
The connection between the NBL (and some of its gen- 
eralizations) with the property of scale invariance is for- 
mulated in Sec. This is followed by a few examples in 
Sec. The most original part is contained in Sec. 
where our Markov-chain model is proposed and solved. 
Finally, the paper is closed in Sec. with some con- 
cluding remarks. For the interested reader, Appendices 
AHC] contain the most technical and mathematical parts 
of Secs. [H] and [IV] 


FIG. 3. Diagram showing how the first digit changes when 
all the records of a dataset are doubled. 


Il. ORIGIN OF THE LAW 


Often times, when one first speaks to a friend, relative, 
or even a colleague about the NBL, their first reaction is 
usually of skepticism. Why is the first digit not evenly 
distributed among the nine possible values? A simple 
argument shows that, if a robust distribution law exists, 
it cannot be the uniform distribution whatsoever. 

Imagine a long list of river lengths, mountain heights, 
and country surfaces, for example. It is possible that 
the lengths of the rivers are in km, the heights of the 
mountains in m, and the surfaces of countries in km?, 
but they could also be expressed in miles, feet, or acres, 
respectively. Will the distribution pg depend on whether 
we use some units or others, or even if we mix them? It 
seems logical that the answer should be negative; that is, 
the pq distribution should be (statistically) independent 
of the units chosen; in other words, it is expected that the 
pa distribution is invariant under a change of scale. The 
uniform distribution pq = % obviously does not follow 
that property of invariance. Suppose we start from a 
dataset in which all the values of the first digit are equally 
represented. If we multiply all the records in the dataset 
by 2, we can see that those records that started before 
with 1, 2, 3, and 4 then start with either 2 or 3, either 
4 or 5, either 6 or 7, and either 8 or 9, respectively. On 
the other hand, all those that started with 5, 6, 7, 8, or 9 
start now with 1. All those possibilities are schematically 
shown in Fig. Therefore, if pa = $ initially, then 
pi = 3 and p2 + p3 = p4 + ps = pe + pr = ps + po = $ 
after doubling all the records, thus destroying the initial 
uniformity. 

Thus, the most identifying hallmark of the NBL is that 
it must be applied to records that have units or, as New- 
comb himself writes4+ “As natural numbers occur in na- 
ture, they are to be considered as the ratios of quanti- 
ties.” 


While relatively formal mathematical proofs of the 
NBL can be found in the literature42-2 in Appendix [A] 
we present a sketch of a simpler, physicist-style deriva- 
tion of the law by imposing invariance under a change of 
scales 

Equation (i) can be generalized beyond the first digit. 
The probability that the m first digits of a record 
match the ordered string (d1, d2,...,dm), where dı € 
{1,2,...,9} and d; € {0,1,2,...,9} if i > 2, is given by 
(see Appendix [A) 


m zi 
Pdi ,d2,...4dm = 10819 | 1+ = di x 10m) . (2) 


i=1 


As an example, the probability that the first three digits 
of a record form precisely the string (3,1,4) is p3,1,4 = 
logy) (1 + 1/314) = 0.001 38. 

Once we have Pd, j,do,...,dm, We can calculate the prob- 
ability po” that the mth digit is d, regardless of the 
values of the preceding m — 1 digits, just by summing for 
all possible values of those preceding m — 1 digits, 


=> 


d,=1 d2=0 


S Pdy,d2,...,dm—1,d (3) 


dm—1=0 


In Table [I| the law for the first digit, pg, is accompanied 
by the laws for the second, third, and fourth digits, ob- 
tained from Eqs. and (8). As the digit moves to the 
right, the probability distribution becomes less and less 
disparate. 

In Fig. Blwe saw that, when multiplying a dataset {rn } 
by 2, part of the records that started with d = 1, 2,3, 4, 
specifically those that start with the first two digits (d, 0), 
(d,1), (d,2), (d,3), or (d, 4), will start with 2d, while the 
rest will start with 2d + 1. Let us call ag the fraction 
of records that, initially starting with d = 1, 2, 3, 4, start 
with 2d by doubling all the records. Thus, 


4 
ies Deda=o Paid d= 1,2,3,4. (4) 


Pa 


If the dataset fulfills the NBL, then one has 


logio = logio = 
= 2 ~ 0.58496, az= 4 ~ 0.55034, 
logio 2 3 
log10 3 
(5a) 
logioz logio s 
03 = - ~ 0.53584, a4 = 5 ~ 0.527 84. 
log10 3 log10 q 
(5b) 


We will use these values later in Sec. 


Hl. APPLICATIONS AND EXAMPLES 


The applications and verifications of the NBL 
are numerous and cover topics as varied and 
prosaic as the study a the genome-4 atomic-2 
nuclear 46 and particle physics, „astrophysics dă 
quantum correlations 2 toxic emissions == 9. biophysics 24 
medicine 22 2 dynamical systems% distinction of chaos 
from noise?24 statistical physics#2 scientific citations 
tax audits 527 electoraZ® or scientific22 frauds, gross 
domestic product; 32 stock market Ši 9:31 inflation data 32 
climate changed 3 world wide web#4 internet traffic 22 oo 
cial networks 2% É textbook exeneenai image processing 28 
religious. activities A 2 dates of birth a hydrology and 
geology 44 „raginentotion processes == the first letters 
of words% or even COVID-1944 Other examples can 
be seen in the link of Ref. |45 ZĘ! In this section, we will 
present some additional examples. 

Let us start with one of the situations that Ben- 
ford himself studied in his classic paper# city popula- 
tions. Using data from the Spanish National Institute of 
Statistics42 we have considered the 2019 population of 
the 165 municipalities in the province of Badajoz (plus 
the total population of the province of Badajoz), of the 
223 municipalities of the province of Caceres (plus the to- 
tal population of the province of Caceres), and the total 
population of the 388 municipalities of the Spanish re- 
gion of Extremadura, which encompasses the provinces 
of Badajoz and Caceres, (plus the total populations of 
the provinces of Badajoz and Caceres). We have also 
considered the population (according to the 2016 cen- 
sus) of the 8110 Spanish municipalities. For all these 
lists of records, we have analyzed the frequency of those 
starting with d = 1,2,...9 and the results are compared 
in Fig. There is a good general agreement between 
the population data and the NBL predictions. This is 
interesting, since it is not obvious that the distribution 
of the number of inhabitants of municipalities should be 
invariant under a change of scale. 

Let us now turn to two examples from astronomy. In 
the first one, we take the distance from Earth (in light- 
years and in parsecs) to the 300 brightest stars42 In the 
second case, the data are the daily number of sunspots 
from 1818 to the present48 As seen in Fig. [5] the dis- 
tances between our planet and the 300 brightest stars 
agree reasonably well with the NBL, despite the fact that 
the list is not excessively long (only 300 data points) and 
that there are “local” deviations (for example, pg < p7 in 
the two choices of units). This general agreement was to 
be expected, since the distribution of digits in distances 
(which are expressed in units) is a clear example of in- 
variance under a change of scale. However, in the case 
of the daily number of sunspots, quantitative (although 
not qualitative) differences are observed with the NBL, 
especially in the d = 1, 3, 4, and 5 cases. It should 
be noted that, although the series is rather long (more 
than 59000 records, after excluding days without data 
or with zero spots), each record only has one, two, or 
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FIG. 4. Histograms showing the distribution of the first digit 
for (from left to right at each digit d) the NBL and the popu- 
lations of the municipalities of the provinces of Badajoz and 
Caceres, the region of Extremadura, and Spain. 
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FIG. 5. Histograms showing the distribution of the first digit 
for (from left to right at each digit d) the NBL, the distances 
to Earth in light-years and in parsecs from the brightest 300 
stars, and the daily number of sunspots. 


three digits (the maximum number of sunspots was 528 
and corresponded to August 26, 1870), thus producing 
a certain bias to records starting with 1. Moreover, the 
daily number of sunspots cannot be expressed in units 
and therefore may not be scale-invariant. 

Finally, we have analyzed the prices of 1016 items 
from a chain of fashion retailers#2 and of 1373 prod- 
ucts from a chain of hypermarkets22. The results are 
shown in Fig. [6] In this case, the discrepancies with the 
NBL are more pronounced. Although the highest fre- 
quencies occur for d = 1 and d = 2, the observed values 
of pa do not decrease monotonically with increasing d. 
In the case of the fashion retailers, we have p4 > p3 
and pg > pg > pe > p7; in the prices of the chain of 
hypermarkets, pg > pọ > p7 > pe. In principle, one 
might think that, since they can be expressed in euro, 
pound, dollar, peso, ruble, yen, ..., prices should sat- 
isfy the property of invariance under a change of scale 
inherent to the NBL. However, commercial and artificial 
pricing strategies must be superimposed on this invari- 
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FIG. 6. Histograms showing the distribution of the first digit 
for (from left to right at each digit d) the NBL and the prices 
of articles of a chain of fashion retailers and a chain of hyper- 
markets. 


ance, which generates relevant deviations with respect to 
the NBL. 


IV. A SIMPLE MODEL OF A MARKOV CHAIN BASED 
ON THE SCALE INVARIANCE PROPERTY OF THE 
NEWCOMB-BENFORD DISTRIBUTION 


As already said, NBL, Eq. (i), is invariant under a 
change of scale; that is, if we start from a dataset {rn } 
that fulfills the NBL and multiply all the records by a 
constant A, the resulting dataset {Arn} still fulfills the 
NBL. 

It is tempting to conjecture that the NBL should be 
an attractor of this process. This would mean that, 
if we started from an initial dataset {r,,(0)} that does 
not fulfill the NBL and generated new sets {r,,(t)} = 
{\‘r,(0)} at times t = 1,2,... by multiplying each suc- 
cessive dataset by A (other than a fractional power of 
10), the first-digit distribution {pa(t)} of the generated 
sets would converge toward the NBL. However, this is 
not the case. As illustrated in Fig. [Z] for the case \ = 2 
and an initial dataset of 104 random numbers, the time- 
dependent distribution {pg(t)} exhibits a quasiperiodic 
oscillatory pattern around the NBL distribution with- 
out any apparent amplitude attenuation. In fact, since 
210 — 1024 ~ 103, the distribution nearly returns to the 
uniform initial one at times t = 10,20,30,.... This be- 
havior is reminiscent of the Poincaré recurrence time in 
mechanical systems and the associated Zermelo paradox 
about irreversibility 24 

In the transformation {r,(t)} > {rn(t + 1) = 2ra (t)}, 
the first-digit distribution changes, according to Fig. B] 
as 


pi(t +1) = ps(t) + pe(t) + p7(t) + ps(t) + po(t), (6a) 


p2(t +1) = aipı(t), ps(t+1)=(1—a1)pi(t), (6b) 
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FIG. 7. Evolution of the first-digit distribution pa(t) (top 
panel) and of the ratio pa(t)/pq4 (bottom panel, where a verti- 
cal shift d— 1 has been applied for better clarity), {pz} being 
the NBL distribution, when starting from a dataset of 104 
random records uniformly distributed between 0 and 1 and 
doubling the records at each time step. Note the overlap of 
some of the points in the top panel. 


pa(t +1) = œzp2(t), ps(t+1)=(1—a2)pa(t), (6c) 


pelt +1) =asps(t), p(t +1)=(1—a3)ps(t), (6d) 


ps(t+1)=aa4pa(t), po(t+1)=(1—a4)pa(t), (6e) 


where the fractions ag (d = 1,2,3,4) are defined by Eq. 
(4). Note that, in general, the fractions ag depend on 
the distributions of the first digit, pa(t), and of the first 
two digits, pa,d.(t), of the dataset {r,,(t)} [see Eq. (4)]. 
As a consequence, (i) Eqs. (6) do not make a closed set 
and (ii) the parameters ag depend on t. 

At this point, we construct a simplified dynamical 
model in which the four unknown and time-dependent 
parameters aq in Eqs. (6) are replaced by fixed constants. 
In matrix notation, 


p(t +1) =W- ptt), (7) 
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FIG. 8. Evolution of the ratio pa(t)/p} (where a vertical shift 
d—1 has been applied for better clarity), when starting from 
a uniform initial distribution pa(0) = 3, according to our 


Markov-chain model, Eq. (Z). 
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FIG. 9. Evolution of the ratio pa(t)/pq (where a vertical shift 
d—1 has been applied for better clarity), when starting from 
an inverted initial distribution pa(0) = pjp_g, according to 
our Markov-chain model, Eq. (Z). 


where p(t) = (pi(t),po(t),-..,p9(t))' is a column vector 
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FIG. 10. Evolution of the Kullback—Leibler divergence Dpi (t) 
(in logarithmic scale), starting from the uniform initial dis- 
tribution pa(0) = $ and from the inverted initial distribution 
pa(0) = pig—a; according to our Markov-chain model, Eq. (Z). 
The dashed line is proportional to |a2,3|’ (see Appendix [C). 


({ denotes transposition) and 


0 0 0 © L411 

ay 0 0 0 00000 
l-—a, 0 0 0 00000 

0 a2 0 0 00000 

W= O l-a 0 0 00000 
0 0 a3 0 00000 

0 0 I1—ag 0 00000 

0 0 0 a4 00000 

0 0 0 1l-az 00000 


(8) 
is a fixed square matrix. Equation fits the canoni- 
cal description of Markov chains22 where W is the so- 
called transition matrix, and {aq} correspond to transi- 
tion probabilities. In this way, we may forget about the 
meaning of {pa(t)} as the first-digit distribution of the 
dataset {r,,(t)} and look at the numerals 1,2,...,9 as 
labels of nine distinct internal states of a certain physical 
system which follow each other in the prescribed order 
sketched by Fig. 

Note that the matrix W is singular, that is, not in- 
vertible. This implies the irreversible character of the 
transition {pa(t)} —> {pa(t + 1)}. The stationary solu- 
tion p* to Eq. satisfies the condition p* = W.- p*. 
Such a stationary solution will be an attractor of our 
Markov process if lim;-,.. p(t) = p* for any initial con- 
dition p(0). This property, along with the exact solution 
of Eq. (Z), is proved in Appendix [B] If we choose for ag 
the values given by Eqs. (5), the stationary solution coin- 
cides exactly with the NBL, as could be expected. This 
will be the choice made in the rest of this section. 

To illustrate the irreversible evolution of p(t) toward 


p*, we are going to consider two different initial condi- 
tions. First, we start from a uniform distribution, that 
is, pa(0) = 4. The result is shown in Fig. [8] where we 
see that the evolution is oscillatory (see Appendix [B] for 
an explanation) and the oscillations are rapidly damped 
to attain the stationary solution. As a second exam- 
ple, we take an NBL inverted initial distribution, that 
is, pa(0) = pto_g, 50 that, initially, state 9 is the most 
probable and state 1 is the least probable. In this second 
example, as shown in Fig. [9] the initial oscillations are of 
greater amplitude but, as before, the stationary distribu- 
tion is practically reached after a few iterations. Compar- 
ison between Figs.[7Jand[8]shows that the main difference 
between our Markov model and the non-Markovian evo- 
lution of {pa(t)} in a real dataset is that the oscillation 
amplitudes are attenuated in the model and not in the 
original framework. 

It seems convenient to characterize the evolution of the 
set of probabilities {pa(t)} obeying the Markov process 
[Eq. (2)] toward the attractor {p4} by means of a single 
parameter that, in addition, evolves monotonically, thus 
representing the irreversibility of evolution. It is expected 
that these properties are verified by the Kullback—Leibler 
divergence which in our case is defined as 


9 


pada nG A., (9) 


d=1 Pa 


This quantity represents the opposite of the Shannon 
entropy&* of {pq(t)}, except that it is measured with re- 
spect to the stationary distribution {p%}. In the context 
of our Markov-chain model, Dg is related to information 
theory. On the other hand, the mathematical structure 
of Dx can be extended to physical systems, thus provid- 
ing a statistical-mechanical basis to the thermodynamic 
entropy2422 as exemplified below. 

Figure [10] shows the evolution of Dx, (t) for the same 
initial conditions as in Figs. B] and [9] Both cases con- 
firm the desired monotonic evolution of Dx, (t). Also, 
the asymptotic decay Dx, (t) > 0 occurs essentially ex- 
ponentially with a rate independent of the initial proba- 
bility distribution. This is proved in Appendix[C] as well 
as the monotonicity property 


Dxxr(t + 1) < Dx (t), (10) 


with the equality not holding for two successive times 
unless pa(t) = p3, in which case Dg = 0. 

Thus, we can say that the NBL in our Markov model 
plays a role analogous to the equilibrium state in ther- 
modynamics and statistical mechanics. In this analogy, 
the “entropy” of the out-of-equilibrium system would be 
(except for a constant) S = —Dxz, so that S increases 
irreversibly in the evolution toward equilibrium. 

This analogy is put forward in Table[I]] where we com- 
pare a system described by the Markov chain [Eq. (7) 
to a classical dilute gas as an example of a well-known 
physical system. In both cases, a statistical description 
is constructed by introducing the adequate probability 


TABLE II. Analogy between a classical dilute gas (as a prototypical physical system) and a system described by the Markov 
chain, Eq. (Z). In the expression of the Maxwell—Boltzmann distribution fms (U), m is the mass of a molecule, T is the gas 
temperature, and kg is the Boltzmann constant. In the Boltzmann equation, J[v| f(t), f(t)] is the collision operator. 


Dilute gas 
Probability distribution 


Normalization fesse, t)=1 
Of (v,t 
Evolution equation Boltzmann equation: i ) 


Equilibrium state Maxwell—Boltzmann: fmg(v) = ( 


Entropy functional 


ds(t) . 


Irreversibility equation ae = 


distribution: the velocity distribution function (gas) and 
the probability of the internal state d (Markov chain); 
while f(v,t) is continuous in both y and t, pa(t) is dis- 
crete in d and t. The evolution equation for the prob- 
ability distribution function is the Boltzmann equation 
(gas, under the molecular chaos ansatz22) and Eq. (Z) 
(Markov chain). The equilibrium state is characterized 
by the Maxwell-Boltzmann distribution2£ fmg(v) (gas) 
and the NBL distribution p (Markov chain). In both 
classes of systems the nonequilibrium entropy functional 
S(t) can be defined, within a constant, as the opposite of 
the Kullback—Leibler divergence2228 of the equilibrium 
distribution from the time-dependent one. Boltzmann’s 
celebrated H-theorem2&22 (gas) and the result presented 
in Eq. (10) (Markov chain) show that the entropy S(t) 
never decreases, irreversibly evolving toward its equilib- 
rium value. 


V. CONCLUDING REMARKS 


One of the main goals of this article was to show that, 
contrary to what might be initially thought, the first sig- 
nificant digit d of a dataset extracted from measurements 
or observations of the real world is not evenly distributed 
among the nine possible values, but typically the fre- 
quency is higher for d = 1 and decreases as d increases. 
The NBL, Eq. (J), gives a mathematical expression to 
this empirical fact, although it does not always need to 
be rigorously verified due to statistical fluctuations (as 
happens with populations in Fig. Ø] and with distances 
in Fig. 5), limitations of the sample (as in the sunspot 
case in Fig. 5), or an artificial bias (as in the case of 
prices of articles in Fig. [6). It is to be expected that, 
except for unavoidable statistical fluctuations, the law is 
fulfilled in datasets in which quantities are measured in 
units, so that the distribution of the first digit is inde- 
pendent of the units chosen due to its invariance under 
a change of scale. More generally, the NBL is satisfied 


Velocity distribution function: f(v,t) 
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when the mantissa of the logarithms of the considered 
data is uniformly distributed. That makes lists not di- 
rectly related to physical quantities, such as Fibonacci 
numbers or powers of 2, also satisfy the NBL. 


Gaining inspiration from the scale invariance prop- 
erty of the NBL, we have constructed a Markov-chain 
model for a nine-state system that evolves in time ac- 
cording to the scheme depicted in Fig. [B] The initial- 
value model can be exactly solved, the solution show- 
ing an irreversible relaxation toward the NBL probability 
distribution. Moreover, we have proved that the associ- 
ated (relative) entropy monotonically increases, in anal- 
ogy with the second law of thermodynamics in physical 
systems. 


Until the 1970s (which is when scientific pocket calcu- 
lators began to be used), physicists used tables of loga- 
rithms (or their application in slide rules) for small every- 
day scientific calculations. Those calculations are nowa- 
days performed on pocket calculators, cellular phones, or 
personal computers with a wide variety of existing math- 
ematical programs. Since the data that are manipulated 
in physics are extracted from “real” situations, such as 
experiments, models, physical constants, ..., we can con- 
clude, as a tribute to Newcomb and Benford and their 
logarithmic tables, that the keyboard button 1 will be 
the one that presents the greatest wear and tear, while 
that of 9 will be the least used. 
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Appendix A: Derivation of the NBL and some 
generalizations 


Consider a long list of records {r,} that, without 
loss of generality for the matter at hand, we will as- 
sume positive. Each record can be written in the form 
Tn = Zn X 10%», where kn is an integer and æn € [1, 10) 
is the significand. Obviously, it is the distribution of the 
significand that is relevant for the NBL. The significand 
£n is directly related to the mantissa Hn of the deci- 
mal logarithm of rn, that is, logyg Tn = kn + HUn, where 
Hn = logio £n € [0, 1). 


Let P,(a)dx be the probability that the significand 
lies between x and x + dz, so that the normalization 
condition is JE dz P(x) = 1. The probability that the 
first significant digit of any record is d is then given by 
the integral 


d+1 
p= I da P(x). (A1) 


More generally, if we consider an ordered string 
(dı, d2,...,dm) made up of the first m digits, where 
dı € {1,2,...,9} and d; € {0,1,2,...,9} if i > 2, it is 
obvious that the records whose m first digits match the 
string (dı, d2,...,dm) will be those whose significand x 
is greater than or equal to dı + dz x 107! + --- + dm X 
1070-0 and less than dı + dz x 1071 +- - -+ (dm +1) x 
107-1), Consequently, 


1070-0947 d;x10707) 
Pdi,d2,...,dm = dz P, (x). 
om, d;x10-G-) 
(A2) 


If the distribution P,(x) is actually invariant under a 
change of scale, that means that P,(Av) = f(A)P,(2) 
with arbitrary À. Taking into account the normalization 


condition in the form [\°* d(Ax) P, (Ax) = 1, it follows 
that necessarily f(A) = A71, that is, P, (Ax) = A71 P, (x). 
Differentiating both sides of the equation with respect to 
à and then taking A = 1, we easily obtain xP! (x) = 
—P,(x), which, according to Euler’s theorem, implies 
that P(x) is a homogeneous function of degree —1, that 
is, P(x) x £t. The constant of proportionality is ob- 
tained from the normalization condition, finally yielding 


-1 


x 
P,(z)=——, 0< 
(x)=, 0<#<10 (A3) 


This is the unique distribution of significands that is in- 
variant under a change of scale. From Eq. (A3), and 
applying Eqs. and (A2), it is straightforward to ob- 
tain NBL, Eq. (ij, and its generalization, Eq. 2). As a 


consistency test of Eq. (2), it is easy to check that 
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(A4) 


It is interesting to note that the inverse law for the 
significand, Eq. (A3), implies a uniform law for the man- 
tissa (and vice versa). Let P,(u)du be the probability 
that the mantissa lies between u and u + du. Since 
P,,(u)du = P,(x)dz and du = (z~'/1In10)dz, Eq. 
gives us P,,() = 1. In Newcomb’s words “The law of 
probability of the occurrence of numbers is such that all 
mantissee of their logarithms are equally probable.” An 
immediate consequence is that, if u is a random variable 
uniformly distributed between 0 and 1, then the random 
variable x = 10" fulfills the NBL. This is in fact an easy 
way to generate a list of random records matching the 
NBL. 


There are deterministic series that also satisfy the 
NBL. Suppose the series {rn = aa” + b8",n = 1,2,...}, 
where a, b, a, and 8 are real numbers with a > 0, 
a > B > 0, and logoa = irrational+2 In that case, 
limp oo logyg Tn = nlogyya + logjoa has a uniformly 
distributed mantissa, so {rn } satisfies the NBL. That in- 
cludes, for example, the series {2}, {3”}, and {Fn}, 
where Fn = Z |p” — (—y7')”] are the Fibonacci num- 


bers, y = 4 (1 + v5) being the golden ratio. Similarly, 
the series {n!} also verifies the law #2 


Another important property is that if a list {r,,} fulfills 
the NBL, so does the list {r2}, a being a real number. 
Indeed, if logigtn = kn + Hn, the mantissa pun being 
uniformly distributed, then the mantissa of logor = 
a(kn + Hn) is also evenly distributed. This is directly 
related to the fact that the NBL is not only invariant 
under a change of scale but also under base change as 
would be expected, given the artificial character of the 
decimal base. To see it, let us assume a base b and build 
the list {r4 }, with a = log,, b, from a list {rn} that fulfills 
the NBL. In that case, rå = yn x b!» where Un = UAE 
[1, b) is the significand of r£ in the base b. The probability 
distribution P,(y) is related to the distribution P,(x) 
through the equation P,(y)dy = P,(x)dz, so that Eq. 
leads to 


Py(y) = yo 


ieee 
“ae oe 


(A5) 
Therefore, NBL in an arbitrary base b takes the form 


(A6) 


1 
Pa = log, (1+5), d=1,2,...,b-—1. 


Appendix B: Exact solution of the Markov-chain model 


Note first that Eqs. (6) verify the normalization con- 
dition yy palt + 1) 53 palt) = 1. There- 
fore, only eight of the probabilities {pa,d = 1,2,...,9} 
are linearly independent, so we can eliminate one of 
them. If we choose pọ = 1 — yo 1Pa, Eq. gives 
us pı(t + 1) = 1 — pı(t) — p2(t) — p(t) vail). Al- 
though it is not strictly necessary, it is mathemati- 
cally convenient to split the column vector p(t) into 


the subsets p(t) = (pi(t), po(t), pa (t), pa(t))', pult) = 


(ps(t),po(t),pr(t),ps(t))", and po(t). As a consequence, 
the matrix equation (7) can be decomposed into 


pit+1)=q+A-pi(t), pu(t+1)=B-pi(t), (B1) 
where q = (1,0,0,0) and 
—-1 -1 -1 -1 
pal am 9 0 OF (B2s) 
l-a, 0 0 0 
0 ag 0 0 
0 1— a2 0 0 
p- |0 0 œ 0 (B2b) 
0 0 1l—a3 0 
0 0 0 a4 


In this way, one deals with two independent 4 x 4 matrices 
instead of the 9 x 9 transition matrix introduced in Eq. 
(8). Moreover, only the equation for the projected vector 
pr in Eq. needs to be solved. Once the solution is 
obtained, the solution for the complementary projected 
vector pri is straightforward. 

By recursion, it is easy to check that the solution to 
the initial-value problem posed by Eq. is 


t-1 

= S0A"-q+A‘- pi(0) 
n=0 
(l= 


A’) - pt +A’ - pi(0), (B3a) 
pu(t) =B-(I—A**)- pp +B-A**- pi(0), (B3b) 
where | is the identity matrix and 
1 
pi = (I-A)? a= —— a |, (Baa) 
3+ aiaz | 1— a 
aya, 
a(l = a2) 
1 (1— a1 )ag 
Py PY 3ta | =a as) ( ) 
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is the unique stationary solution. From the normalization 
condition, one has pj = a a2(1 — a4)/(3 + ayaz). Note 
that, as seen from Eqs. (B3), the initial values py(0) do 
not influence the evolution of either pr(t) or pr(t). 

The stationary solution will be an attractor if 
lim¢—oo pi(t) = př and limyoo pur(t) = piy for any ini- 
tial condition p1(0). According to Eqs. (B3), this implies 
limis At =). 

To check the above condition, let us obtain the four 
eigenvalues {a;, i = 0,1,2,3} of A. It is easy to see that 
the characteristic equation is a(aja2 + a + a? +a?) = 0. 
Therefore, the eigenvalues are dg = 0 and 


n=- (14-8), (B5a) 
1 1v3 1F1v3 
a2,3 = T3 (: = -g pA N 2 s) > (B5b) 


where 2 is the imaginary unit and 
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(B6) 
Consequently, 
=U. DŻ- UTI, t=1,2,..., (B7) 
where 
0 a? az a2 
aiga Q12 Era 
0 ay a2 a3 
U= | gS (Bijan a 8 
ra Q1Q2 Q13 
1 1 1 1 
00 0 0 
t 
p=|0% "i Say (B9) 
0 0 a 0 
00 0 a$ 


From Eqs. it can be verified that |ai| < |a2,3| < 1 
for 0 < œa < 1, so that limiy a: = = 0 fori = 1,2,3. 
Therefore, i e D! = 0 and hence lim:_,,,A’ = 0. 
This proves the caracter of the stationary distribution 
p* as an attractor of the dynamics. Moreover, since both 
the real eigenvalue (a,) and the real part of the complex 
eigenvalues (az 3) are negative, the evolution of each pa(t) 
to p* is oscillatory, as can be observed in Figs. [B] and 
[9] Note that, in the analysis above, the eigenvalues 0 


(double), 1 — a3, and a4 of the A B are not needed. 

If we choose ag = 4, then Eqs. 2 provide the sta- 
tionary solution pj = $ ~ 0.308, po = p3 = $ œ 0.154, 
pa = ps = pe = Pt = $ ~ 0.077, ps = po = 4 ~ 0.038. 


These values are not ico different from those of the true 
NBL, as can be seen from Table [I| On the other hand, 
the most natural choice is provided by Eqs. (8), in which 
case the stationary solution coincides exactly with the 
NBL. 


Appendix C: Properties of the Kullback—Leibler divergence 
in the Markov model 


The Kullback—Leibler divergence is defined by Eq. (9). 
In order to analyze its asymptotic decay, let us con- 
sider times that are long enough so that the deviations 
dpa(t) = pa(t) — pă can be considered small. In this 
regime, we can expand Eq. (9) in a power series and re- 
tain the dominant terms. The result is 


(Cl) 


On the other hand, for times long enough, |a;|' < |a2,3]° 
(note that |aı| = 0.4261 and |a2.3| = 0.8692), so that, 
according to Eqs. (B3), dpa(t) ~ |a2,3|'. Thus, 


2t 2tl : 
|a2,3|7% = 107410810 la2.3l, 


Dxx(t) ~ (C2) 


This asymptotic behavior is represented in Fig. [0] 
Finally, let us prove Eq. (10). According to Eq. (9), 


t+1 
Dxi(t + 1) =pi(t+ 1) In prt) 
Pi 
4 
t+1 
+5 [pzc +1)In paalt + 1) 
d=1 Poa 


+peati(t + 1) In = . (C3) 
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Evolution equations (6) and the stationarity condition 
p* = W - p* can be rewritten as 


9 9 
mtt) = J pt), pPi= dopa, (Ca) 
d=5 a= 
patt+1) V_} o@ logy, a =1,2,3,4, 
P2a+ı(t + 1) 1—aq 
(C4b) 
a = Oa Pa» d = 1, 2, 3, A. (C4c) 
P2d+1 l- aa 


Inserting Eqs. (C4) into Eq. (C3) one gets 


Dares Par (t) Palt) 
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Given the stationary values {p3,d = 5,...,9}, the dif- 
ference ADxz(t + 1) is a function of the 5 probabili- 
ties {pa(t),d = 5,...,9}. To find the maximum value 
of ADxi(t + 1), we take the derivative 

OAD x(t + 1) 


( — ae palt DA 5 Par 

Opa (t) Dy yes Pa’ (t) 

The unique solution to the extremum conditions 
OAD xz (t + 1)/Opa(t) = 0 is pa(t) = yp} (d = 5,...,9), 
where 0 < y < 1/37}_.p% is arbitrary. In such a case, 
ADxi(t + 1) = 0. To see that this is actually a maxi- 


mum value, suppose, for instance, that pa(t) = 0 except 
if d = do (with dọ = 5,...,9). In that case, it is easy 


to find ADxr(t + 1) = pa (t)ln (ri, E vi) < 0. 
We then conclude that ADxi(t + 1) < 0, i.e., we ob- 
tain Eq. (10), the equality holding only if palt) = yp* 
(d=5,...,9). Note that, even though ADxr(t+ 1) = 0 
if pa(t)/p*, = constant for d = 5,...,9, this proportional- 
ity property is broken down at t + 1 unless y = 1. Asa 
result, ADx,(t + 2) < 0, even though ADx,(t + 1) = 0, 
except in the stationary solution. 
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